1. INTRODUCTION

In this note, we consider the dependence of the conditional density in the
nonlinear filtering problem on the initial a-priori distribution of the state. From a
practical point of view, one is often interested in knowing how long one has to
wait to reach near optimality when the optimal filter is initialized with the wrong
initial conditions.

Our purpose in this note is mainly to expose the problem. Despite our
efforts, we have at the moment only very partial results, mainly of an asymptotic
nature, which are explained below. Even those results, however, exhibit
interesting features: it turns out that, in the limit of high signal-to-noise ratio,
structural properties of the process, and mainly of its observations, dominate the
memory length.

The models we consider are of one of the following types:

(a) Finite state space in white noise. The state process $x_t$ is a
continuous-time, ergodic, stationary Markov chain with values in $\{1,2,\dots,k\}$ and
with infinitesimal generator $G = \{g_{ij}\}_{1\le i,j\le k}$: if
$P_{ij}(\varepsilon) = P(x(t+\varepsilon) = j \mid x(t) = i)$, then

$$P_{ii}(\varepsilon) = 1 + g_{ii}\varepsilon + o(\varepsilon), \qquad
P_{ij}(\varepsilon) = g_{ij}\varepsilon + o(\varepsilon), \quad j \ne i. \tag{1.1}$$

The observation process $\{y_t,\ t \ge 0\}$ satisfies

$$dy_t = h(x_t)\,dt + N_0^{1/2}\,db_t \tag{1.2}$$

where $b_t$ is a one-dimensional Brownian motion independent of $x_t$ and
$h = \{h(i)\}_{1\le i\le k}$ is a given vector.

(b) Rational Gaussian process in white noise. The state process
$x_t \in R^n$ satisfies the linear stochastic differential equation

$$dx_t = Ax_t\,dt + B\,dw_t, \qquad p(x_0) = N(m_0, P_0) \tag{1.3}$$

with $w_t$ being an $n$-dimensional Brownian motion, $A$, $B$ given constant matrices of
appropriate dimensions, and $p(x_0)$, the initial a-priori density of $x_0$, normal with
mean $m_0$ and covariance matrix $P_0$. The observation process $y_t \in R^m$ satisfies

$$dy_t = Cx_t\,dt + N_0^{1/2}\,db_t \tag{1.4}$$

where $b_t$ is an $m$-dimensional Brownian motion independent of $x_t$ and $C$ is
again a given constant matrix.

(We remark that a natural candidate for a model in place of (1.3), (1.4) is the
general diffusion nonlinear filtering problem; a remark concerning it can be found at
the end of Section 3.)

We now define the notion of "memory length". In both cases, let

$$p_t^{p_0}(x) = P_{p_0}(x_t = x \mid y_s,\ 0 \le s \le t) \tag{1.5}$$

denote the conditional distribution (density in case b) of $x_t$, given an initial a-priori
distribution (density in case b) $p_0$.

The "memory length" $\gamma$ is defined as follows:

$$\text{(case a)}\qquad \gamma = \sup_{p_0,\,p_0'}\ \limsup_{t\to\infty}\ \frac{1}{t}\log\bigl\|p_t^{p_0} - p_t^{p_0'}\bigr\| \tag{1.6a}$$

where $\|\cdot\|$ denotes the Euclidean norm, and

$$\text{(case b)}\qquad \gamma = \sup_{m_0,\,m_0' \in K}\ \limsup_{t\to\infty}\ \frac{1}{t}\log\bigl\|\hat x_t(m_0, P_0) - \hat x_t(m_0', P_0)\bigr\| \tag{1.6b}$$

where $\hat x_t(m_0, P_0)$ is the conditional mean starting from $N(m_0, P_0)$ and $K$ is some
compact set. In both cases, $\gamma$ is reminiscent of the usual definition of Lyapunov
exponents. In (1.6b), we consider $\hat x_t$ because, the conditional density being
Gaussian, it best characterizes the distribution. We could also allow $P_0$ to change;
for brevity we do not do so here, even though the analysis is exactly the same.

In both cases (a), (b), as $N_0 \to \infty$, $\gamma$ will tend to the negative eigenvalue of $G$
closest to zero (in real part) in case a, and to the corresponding pole in case b. As
$N_0 \to 0$ ("high signal to noise"), surprisingly, $\gamma$ does not necessarily tend to $-\infty$
(this is however the case when $n=m=1$, $C \ne 0$ in case (b) and $k=2$, $h_1 \ne h_2$ in
case (a)). For case a, we provide an example where $\gamma \to 0$ as $N_0 \to 0$, whereas in
case (b), the complete analysis of Section 3 shows that if the "transfer function" of
(1.3), (1.4) possesses zeros on the imaginary axis, $\gamma \to 0$ as $N_0 \to 0$, and otherwise
$\gamma \to -\infty$ as $N_0 \to 0$. Thus, structural properties of the system involved determine
its "forgiveness" of initial mistakes, even at high signal-to-noise ratios!

The remainder of the paper is organized as follows: in Section 2, case a is
presented; the analysis as $N_0 \to \infty$, an example with $\gamma \to 0$ as $N_0 \to 0$, and the
$k=2$ case are considered. The difficulties in the general case are also pointed out. In
Section 3, the Gaussian case b is presented, with the full asymptotic analysis as
$N_0 \to 0$.

Acknowledgements.  We  wish  to  thank  Prof.  A.  Willsky  for  many  fruitful 
discussions. 
2. FINITE STATE SPACE CASE

In this section, we consider case a. We start by considering the "high-noise"
behavior:

Theorem 1: If the process $x_t$ is ergodic, there exist two positive constants $\varepsilon$ and
$K_0$ such that, for $N_0 > K_0$,

$$\lim_{t\to\infty}\ \frac{1}{t}\log\bigl\|p_t^{p_0} - p_t^{p_0'}\bigr\| \le -\varepsilon \quad \text{a.e.} \tag{2.1}$$

Moreover, as $N_0 \to \infty$, $-\varepsilon \to \lambda_{\max}(G)$, the largest nonzero real part of the
eigenvalues of $G$.

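Proof. Let $H$ denote the diagonal matrix with $H_{ii} = h(i)$. Then, from Wonham [1],

$$dp_t = p_t G\,dt - N_0^{-1}\langle p_t,h\rangle\, p_t\,(H - \langle p_t,h\rangle I)\,dt
+ N_0^{-1}\, p_t\,(H - \langle p_t,h\rangle I)\,dy_t \tag{2.2}$$

Note that for proving (2.1), it is enough to consider the equation satisfied by the
derivative $q_t$ of $p_t$ with respect to the initial condition $p_0$ in any direction
$d = (d_1,\dots,d_k)$. From (2.2), one has:

$$\begin{aligned}
dq_t ={}& q_t G\,dt - N_0^{-1}\langle q_t,h\rangle\, p_t (H - \langle p_t,h\rangle I)\,dt \\
& - N_0^{-1}\langle p_t,h\rangle\, q_t (H - \langle p_t,h\rangle I)\,dt + N_0^{-1}\langle p_t,h\rangle\, p_t \langle q_t,h\rangle\,dt \\
& + N_0^{-1} q_t (H - \langle p_t,h\rangle I)\,dy_t - N_0^{-1} p_t \langle q_t,h\rangle\,dy_t \\
={}& q_t G\,dt - N_0^{-1}\langle q_t,h\rangle\, p_t (H - \bar h_t I)\,dt + N_0^{-1/2} q_t (H - \bar h_t I)\,d\nu_t \\
& - N_0^{-1/2}\langle q_t,h\rangle\, p_t\,d\nu_t
\end{aligned}$$

where $d\nu_t = N_0^{-1/2}\,(dy_t - \langle p_t,h\rangle\,dt)$ is the innovation white noise and
$\bar h_t = \langle h, p_t\rangle$. This can be written as

$$dq_t = q_t\,(G + N_0^{-1} A_t)\,dt + N_0^{-1/2}\, q_t B_t\,d\nu_t, \qquad q_0 = d \tag{2.3}$$

where $A_t$ and $B_t$ are two matrix-valued measurable processes, and there exists a
constant $a$ depending only on $h$ such that

$$\|A_t\| \le a, \qquad \|B_t\| \le a \qquad \text{for all } t,\ N_0,$$

almost surely ($\|\cdot\|$ denotes the usual norm of matrices).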

Note that the hyperplane $E = \{q : \langle q,\mathbf 1\rangle = 0\}$ (where $\mathbf 1 = (1,\dots,1)^T$) is
stable under $G$, $A_t$ and $B_t$ because

$$G\mathbf 1 = 0,$$

$$A_t\mathbf 1 = -(h p_t)(H - \langle p_t,h\rangle I)\mathbf 1 = -(h p_t)(h - \langle p_t,h\rangle \mathbf 1) = 0,$$

and if $q \in E$:

$$qB_t\mathbf 1 = q(H - \langle p_t,h\rangle I)\mathbf 1 - \langle q,h\rangle\langle p_t,\mathbf 1\rangle
= \langle q,h\rangle - \langle q,h\rangle = 0.$$

But the constraint on $p_0$,

$$\sum_i p_0(i) = 1,$$

implies that $q_0$ has to be chosen in $E$, and $q_t$ remains in $E$. Choosing any fixed
orthogonal basis in $E$, (2.3) can be rewritten as:

$$dq'_t = q'_t\,(G' + N_0^{-1} A'_t)\,dt + N_0^{-1/2}\, q'_t B'_t\,d\nu_t \tag{2.4}$$

where $q'_t = (q'_t(1),\dots,q'_t(k-1))$ is a representation of $q_t$ in this basis; $G'$, $A'_t$ and
$B'_t$ are the matrices associated with the restriction to $E$ of the maps represented
by $G$, $A_t$ and $B_t$ in the whole space. Moreover, the spectrum of $G'$ is the spectrum
of $G$ without the eigenvalue zero, and then, by the assumption of ergodicity, all the
eigenvalues of $G'$ have negative real part.
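The spectral claim about $G'$ can be checked numerically. The following is a minimal sketch, not part of the proof: it builds an orthonormal basis of $E$ from an SVD (one convenient choice among many), forms the restricted matrix under the row-vector convention of (2.3), and compares the two spectra. The helper name `restrict_to_E` and the cyclic four-state generator used as input are assumptions made only for this illustration.

```python
import numpy as np

def restrict_to_E(M):
    """Matrix of the map q -> q M restricted to E = {q : <q, 1> = 0},
    written in an orthonormal basis of E (row-vector convention)."""
    k = M.shape[0]
    # Rows 1..k-1 of Vt from the SVD of the all-ones row vector are
    # orthonormal and orthogonal to (1, ..., 1), so they span E.
    _, _, vt = np.linalg.svd(np.ones((1, k)))
    W = vt[1:]                      # shape (k-1, k)
    # If q = q' W lies in E and M maps E into E, then q M = (q' M') W
    # with M' = W M W^T.
    return W @ M @ W.T

# Hypothetical example: a cyclic four-state generator (rows sum to zero).
G = np.array([[-1.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 0.0],
              [0.0, 0.0, -1.0, 1.0],
              [1.0, 0.0, 0.0, -1.0]])

G_restricted = restrict_to_E(G)
eig_full = np.linalg.eigvals(G)
eig_restricted = np.linalg.eigvals(G_restricted)
print("spectrum of G :", np.round(eig_full, 6))
print("spectrum of G':", np.round(eig_restricted, 6))
```

For an ergodic chain, the restricted spectrum should consist of the nonzero eigenvalues of $G$ only, so its largest real part is the quantity $\lambda$ used in the rest of the proof.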

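Let $S$ be a symmetric positive definite matrix and denote

$$r_t = q'_t S q'^{T}_t, \qquad u_t = r_t^{-1/2}\, q'_t;$$

then

$$\begin{aligned}
dr_t ={}& q'_t\bigl(G'S + SG'^{T} + N_0^{-1}(A'_t S + S A'^{T}_t)\bigr) q'^{T}_t\,dt \\
& + N_0^{-1/2}\, q'_t (B'_t S + S B'^{T}_t)\, q'^{T}_t\,d\nu_t + N_0^{-1}\, q'_t B'_t S B'^{T}_t q'^{T}_t\,dt \\
={}& r_t u_t\bigl(G'S + SG'^{T} + N_0^{-1}(2A'_t S + B'_t S B'^{T}_t)\bigr) u_t^T\,dt
 + 2N_0^{-1/2}\, r_t u_t B'_t S u_t^T\,d\nu_t .
\end{aligned}$$

Using Ito's formula, we get

$$\begin{aligned}
d\log r_t ={}& u_t\bigl(G'S + SG'^{T} + N_0^{-1}(2A'_t S + B'_t S B'^{T}_t)\bigr) u_t^T\,dt \\
& - 2N_0^{-1}(u_t B'_t S u_t^T)^2\,dt + 2N_0^{-1/2}\, u_t B'_t S u_t^T\,d\nu_t ,
\end{aligned}$$

and then

$$\begin{aligned}
\frac1t \log r_t \le{}& \frac1t\log r_0 + \frac1t\int_0^t u_s (G'S + SG'^{T}) u_s^T\,ds \\
& + \frac{1}{tN_0}\int_0^t u_s (2A'_s S + B'_s S B'^{T}_s) u_s^T\,ds \\
& + \frac{2N_0^{-1/2}}{t}\int_0^t u_s B'_s S u_s^T\,d\nu_s . 
\end{aligned} \tag{2.5}$$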

Let $\lambda$ be the largest real part of the eigenvalues of $G'$ and choose $\mu > \lambda$; then the
matrix $H = G' - \mu I$ is still a stability matrix and there exists a symmetric $S$ such that

$$HS + SH^T = -I.$$

For this choice of $S$, the second term of (2.5) is bounded by $2\mu$ (since $u_s S u_s^T = 1$)
and the third term is smaller than $N_0^{-1} c_1(S)$ (where $c_1(S)$ is a constant depending
only on $S$). For the last term, consider the time change

$$\tau_t = \int_0^t (u_s B'_s S u_s^T)^2\,ds;$$

then we know that there exists a Brownian motion $\beta$ such that

$$\int_0^t u_s B'_s S u_s^T\,d\nu_s = \beta_{\tau_t},$$

and then

$$\frac1t\Bigl|\int_0^t u_s B'_s S u_s^T\,d\nu_s\Bigr| = \frac1t\,|\beta_{\tau_t}| \le c_2(S)\,\frac{|\beta_{\tau_t}|}{\tau_t}$$

where $c_2(S)$ is a constant depending only on $S$. This proves that the last term tends
to zero as $t$ tends to infinity (if $\tau_t$ is bounded, use the last equality; if $\tau_t$ tends to
infinity, use the last inequality). Finally, we get

$$\limsup_{t\to\infty}\ \frac1t\log r_t \le 2\mu + N_0^{-1} c_1(S)$$

and then, by the equivalence of the norms of $R^k$:

$$\limsup_{t\to\infty}\ \frac1t\log\|q_t\| \le \mu + N_0^{-1} c_1(S)$$

which is arbitrarily close to $\lambda$ if $N_0$ is large enough.
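The high-noise statement of Theorem 1 can be illustrated by running two filters with different initial conditions on the same observation path and measuring how fast they approach each other. The sketch below is only an illustration under stated assumptions: it uses a splitting discretization (Euler prediction followed by a Bayes correction and renormalization) rather than a direct Euler scheme for (2.2), and the three-state generator, the vector `h`, the step size and the function names are all hypothetical choices. For this symmetric generator the nonzero eigenvalues are both $-3$, so for large $N_0$ the estimated rate should come out close to $-3$.

```python
import numpy as np

def filter_step(p, G, h, dy, dt, n0):
    """One splitting step: Euler prediction p -> p + pG dt, then Bayes correction."""
    p = p + (p @ G) * dt
    # Likelihood of dy given state i, up to a factor common to all states.
    p = p * np.exp((h * dy - 0.5 * h**2 * dt) / n0)
    return p / p.sum()

def forgetting_rate(G, h, n0, p0, p0_alt, t_end=5.0, dt=1e-3, seed=0):
    """Estimate (1/t) log ||p_t - p_t'|| for two filters fed the same observations."""
    rng = np.random.default_rng(seed)
    k = len(h)
    x = int(rng.integers(k))            # hidden state, uniform start (assumption)
    p = np.asarray(p0, dtype=float)
    p_alt = np.asarray(p0_alt, dtype=float)
    log_d0 = np.log(np.linalg.norm(p - p_alt))
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        # Chain: leave state x with probability -G[x, x] * dt, then pick the target.
        rates = G[x].copy()
        rates[x] = 0.0
        if rng.random() < rates.sum() * dt:
            x = int(rng.choice(k, p=rates / rates.sum()))
        # Observation increment dy = h(x) dt + sqrt(N0) db, as in (1.2).
        dy = h[x] * dt + np.sqrt(n0 * dt) * rng.standard_normal()
        p = filter_step(p, G, h, dy, dt, n0)
        p_alt = filter_step(p_alt, G, h, dy, dt, n0)
    return (np.log(np.linalg.norm(p - p_alt)) - log_d0) / t_end

# Hypothetical three-state example with eigenvalues 0, -3, -3.
G = np.array([[-2.0, 1.0, 1.0],
              [1.0, -2.0, 1.0],
              [1.0, 1.0, -2.0]])
h = np.array([0.0, 1.0, 2.0])
rate_high_noise = forgetting_rate(G, h, n0=1e4, p0=[1, 0, 0], p0_alt=[0, 0, 1])
print("estimated decay rate for large N0:", rate_high_noise)
```

The horizon `t_end` is kept short on purpose: the distance between the two filters shrinks exponentially and would eventually be dominated by floating-point round-off.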

We consider next the case $N_0 \to 0$. For $k = 2$, equation (2.4) is one-dimensional,
$A'_t < 0$ and therefore

$$\gamma \to -\infty \quad \text{as } N_0 \to 0. \tag{2.6}$$

One is led to think that this situation is generic; our conjecture is that indeed
it is, under suitable "structural conditions", which we do not know at this point how to
specify. The problem is that eq. (2.4) is a bilinear equation with non-constant
coefficients, and the known methods of computing the Lyapunov exponent fail.
The following counterexample demonstrates clearly that (2.6) does not hold in
general without restrictions; actually, in this example, $\gamma \to 0$!

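Consider the four-state process with transition matrix

$$G = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix} \tag{2.7}$$

cf. Fig. 1. The observation vector is

$$h(x_1) = h(x_3) = h, \qquad h(x_2) = h(x_4) = 0. \tag{2.8}$$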


Note that the fact that $h(x_2) = 0$ is not significant, as the addition of a d.c. term to
$h(x)$ does not change the filter's structure. The fact that the observations are the
same for different states is of extreme significance, as the following intuitive
argument demonstrates: as $N_0 \to \infty$, by Theorem 1, the Lyapunov
exponent of the optimal filter converges to $-1$, and the conditional distribution
converges to the stationary distribution regardless of the initial conditions.
However, as $N_0 \to 0$, consider the two initial conditions

$$(1,0,0,0) \quad\text{and}\quad (0,0,1,0).$$

The fact that $N_0 \to 0$ allows us to track accurately the transitions in the system, but
reveals nothing as to the initial conditions, and thus the state estimate will depend
strongly on $p(0)$. Thus, we expect that the Lyapunov exponent will go to zero as
$N_0 \to 0$; the following analysis will show that this is indeed the case. For
simplicity, we make the change of variables:
$$\Delta(t) = \begin{pmatrix} \dfrac{p_1(t)}{p_1(t)+p_3(t)} \\[2ex] \dfrac{p_4(t)}{p_2(t)+p_4(t)} \end{pmatrix} - \frac12\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

$\Delta(t)$ reflects the "mismatch" with the stationary solution $p_1 = p_2 = p_3 = p_4 = 1/4$,
where $\Delta = 0$; we analyze the Lyapunov exponent of $\Delta(t)$, which is easily seen to
have the same behavior as that of $q(t)$. It is easily seen that $\nu_t$ is the same under
initial conditions $d_1$ or $d_3$ (which correspond to $\Delta_1(0) = 1/2$ and $\Delta_1(0) = -1/2$,
respectively). Note however that, by an easy computation based on (2.2), $\Delta(t)$ satisfies

$$\dot\Delta(t) = \begin{pmatrix} -Y_t & Y_t \\ -1/Y_t & -1/Y_t \end{pmatrix}\Delta(t), \qquad Y_t = \frac{p_2(t)+p_4(t)}{p_1(t)+p_3(t)},$$

where

$$dY_t = \Bigl[1 + \Bigl(\frac{h^2/N_0}{Y_t+1} - 1\Bigr)Y_t^2\Bigr]dt - \frac{h}{N_0^{1/2}}\,Y_t\,d\nu_t \tag{2.9}$$

and $Y_t$ is highly oscillatory. Analysing the equation for $\Delta(t)$ directly is rather
difficult; fortunately, (2.9) is easy to simulate; for

$$\lambda = \lim_{t\to\infty}\ \frac1t\log\|\Delta(t)\|,$$

the results are summarized in Table 1, which agrees with the heuristic analysis
above.

$N_0$        $10^4$    1        0.2      0.1      0.05      0.02
$\lambda$    -1        -0.89    -0.53    -0.21    -0.085    -0.021

Table 1. Lyapunov exponent as a function of $N_0^{1/2}$
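A simulation of the kind summarized in Table 1 can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the reduced system $\dot\Delta = \begin{pmatrix} -Y & Y \\ -1/Y & -1/Y \end{pmatrix}\Delta$ with $dY = [1 + ((h^2/N_0)/(Y+1) - 1)Y^2]\,dt - (h/N_0^{1/2})\,Y\,d\nu$, treats the innovation $\nu_t$ as a Brownian motion, integrates $\log Y$ with an Euler-Maruyama step (so that $Y$ stays positive), and uses $h = 1$, $Y_0 = 1$, a fixed step size and a clipping of $\log Y$ as purely hypothetical numerical choices. Since the value of $h$ behind Table 1 is not stated, the numbers it produces need not match the table.

```python
import math
import numpy as np

def lyapunov_exponent(n0, h=1.0, t_end=100.0, dt=1e-3, seed=0):
    """Estimate lambda = lim (1/t) log ||Delta(t)|| for the reduced four-state system."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    # Innovation increments, modelled as Brownian increments (assumption).
    dnu = rng.normal(0.0, math.sqrt(dt), size=n_steps)
    snr = h * h / n0
    log_y = 0.0              # start at Y = 1 (hypothetical choice)
    d1, d2 = 1.0, 0.0        # unit initial mismatch
    log_norm = 0.0
    for k in range(n_steps):
        y = math.exp(log_y)
        # Explicit Euler step for Delta' = [[-Y, Y], [-1/Y, -1/Y]] Delta.
        n1 = d1 + dt * y * (d2 - d1)
        n2 = d2 - dt * (d1 + d2) / y
        # Renormalize and accumulate the logarithmic growth.
        norm = math.hypot(n1, n2)
        log_norm += math.log(norm)
        d1, d2 = n1 / norm, n2 / norm
        # Ito's formula applied to log Y for the Y-equation.
        drift = 1.0 / y + (snr / (y + 1.0) - 1.0) * y - 0.5 * snr
        log_y += drift * dt - math.sqrt(snr) * dnu[k]
        # Numerical safeguard only: keeps dt * Y small enough for the Euler step.
        log_y = min(max(log_y, -6.0), 6.0)
    return log_norm / (n_steps * dt)

for n0 in (1e4, 1.0, 0.1):
    print(f"N0 = {n0:g}: estimated exponent {lyapunov_exponent(n0, t_end=50.0):.3f}")
```

For very small $N_0$ the noise term in $\log Y$ becomes large, so a smaller step size and a longer horizon would be needed before the estimate can be trusted.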
3. THE GAUSSIAN CASE

In this section, we analyze the "memory length" of the optimal filter for the
Kalman filtering problem (case b). Surprisingly enough, there are cases where the
memory length does not tend to zero when the signal-to-noise ratio is high, even in
the fully observable case, cf. below.

We assume throughout that the pair $(A,C)$ is observable and that $(A,B)$ is
stabilizable (cf. [2]).

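Our results are summarized below:

Theorem 2:

a) For $N_0 \to \infty$, $\gamma \to \lambda_{\max}(A)$, where $\lambda_{\max}(A)$ is defined as in
Theorem 1.

b) For $N_0 \to 0$, let $\varphi(s) = \det(sI - A)$ and

$$H(s) = C(sI - A)^{-1}B.$$

Let

$$\Theta = \{\theta \mid \mathrm{Re}\,\theta < 0,\ \varphi(\theta)\,\varphi(-\theta)\,\det[H^T(-\theta)H(\theta)] = 0\}.$$

Then $\gamma \to \theta_{\max}(\Theta)$, where $\theta_{\max}(\Theta)$ is the element of $\Theta$ with largest real
part.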
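As a quick sanity check of both limits, consider the scalar case $n = m = 1$ with $A = a < 0$, $B = b$, $C = c$ and $bc \ne 0$ (an illustrative special case; the symbols $a$, $b$, $c$ are introduced only for this example). Assuming the stationary form of the Riccati equation (3.2), the limiting covariance and the resulting decay rate of the difference between two filter outputs can be written in closed form:

```latex
0 = 2aP_\infty + b^2 - \frac{c^2 P_\infty^2}{N_0}
\quad\Longrightarrow\quad
P_\infty = \frac{N_0}{c^2}\Bigl(a + \sqrt{a^2 + b^2c^2/N_0}\Bigr),
\qquad
a - \frac{P_\infty c^2}{N_0} = -\sqrt{a^2 + \frac{b^2c^2}{N_0}} .
```

The right-hand side tends to $-|a| = a$ as $N_0 \to \infty$ and to $-\infty$ as $N_0 \to 0$, in line with part (a) and with the fact that $H(s) = cb/(s-a)$ has no zeros.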

Proof. The optimal filter equations are (cf. [5]):

$$d\hat x_t = A\hat x_t\,dt + K(t)\,[dy_t - C\hat x_t\,dt] \tag{3.1}$$

where $K(t) = P(t)C^T/N_0$ and

$$\dot P(t) = AP(t) + P(t)A^T + BB^T - \frac{P(t)C^TCP(t)}{N_0}. \tag{3.2}$$

Under our assumptions, $P(t) \to P_\infty$. Note that (3.1) implies that, if we denote by
$\hat x_t(x_0)$ and $\hat x_t(x_0')$ the outputs of the filter with $\hat x_0(x_0) = x_0$, $\hat x_0(x_0') = x_0'$, and by
$\Delta_t = \hat x_t(x_0) - \hat x_t(x_0')$, then

$$\dot\Delta_t = A\Delta_t - \frac{P(t)C^TC}{N_0}\,\Delta_t,$$

from which (a) follows by the boundedness of $P(t)$, letting $N_0 \to \infty$.
To see (b), we consider first the case $P(0) = P_\infty$; in that case, (b) is a
rephrasing of [2, Theorem 4.13]. In the general case ($P(0) \ne P_\infty$), let

$$T = \inf\Bigl\{t : \Bigl\|\frac{(P(s) - P_\infty)\,C^TC}{N_0}\Bigr\| < \varepsilon \ \text{for all } s \ge t\Bigr\}.$$

For $t > T$ one has

$$\dot\Delta_t = \Bigl(A - \frac{P_\infty C^TC}{N_0}\Bigr)\Delta_t - \frac{P(t) - P_\infty}{N_0}\,C^TC\,\Delta_t
= \Bigl(A - \frac{P_\infty C^TC}{N_0}\Bigr)\Delta_t - K_t\Delta_t \tag{3.6}$$

and $\|K_t\| < \varepsilon$.

By the argument below (2.4) (taking there $B_t = 0$, $N_0^{-1}A'_t = -K_t$ and
$G' = A - P_\infty C^TC/N_0$), one obtains that

$$\lambda_{\max}\Bigl(A - \frac{P_\infty C^TC}{N_0}\Bigr) - \varepsilon C_1 \le \lambda \le \lambda_{\max}\Bigl(A - \frac{P_\infty C^TC}{N_0}\Bigr) + \varepsilon C_1$$

where $C_1$ depends on $G'$ and is independent of $\varepsilon$, and where $\lambda_{\max}(A - P_\infty C^TC/N_0)$
denotes the largest real part of the eigenvalues of $A - P_\infty C^TC/N_0$, which is negative by
the stability of the optimal filter ([2]). Taking $\varepsilon \to 0$ leads to
$\lambda = \lambda_{\max}(A - P_\infty C^TC/N_0)$. Taking now $N_0 \to 0$ yields the theorem.

□


We remark that the theorem implies that even for a stable, controllable and
fully observable system, the limiting Lyapunov exponent can approach zero even
with "good measurements": simply take a system whose transfer matrix has a zero on
the imaginary axis.
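This remark can be explored numerically. The sketch below is an illustration under assumptions, not a statement of the authors' method: it computes the eigenvalues of the stationary error dynamics $A - P_\infty C^TC/N_0$ as the stable eigenvalues of the Hamiltonian matrix associated with the stationary Riccati equation, which avoids solving for $P_\infty$ explicitly. The third-order system, the two output vectors `C_axis` and `C_left`, and the function name `filter_poles` are hypothetical choices made for the example.

```python
import numpy as np

def filter_poles(A, B, C, n0):
    """Eigenvalues of A - P C^T C / N0 at the stationary P, via the Hamiltonian matrix."""
    n = A.shape[0]
    ham = np.block([[A.T, -(C.T @ C) / n0],
                    [-(B @ B.T), -A]])
    eig = np.linalg.eigvals(ham)
    # The spectrum is symmetric about the imaginary axis; the n eigenvalues
    # with the most negative real parts are those of the error dynamics.
    return eig[np.argsort(eig.real)][:n]

# Hypothetical example in controllable canonical form:
# denominator (s+1)(s+2)(s+3), numerator c2 s^2 + c1 s + c0 with C = [c0, c1, c2].
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-6.0, -11.0, -6.0]])
B = np.array([[0.0], [0.0], [1.0]])
C_axis = np.array([[1.0, 0.0, 1.0]])    # numerator s^2 + 1: zeros at +i and -i
C_left = np.array([[20.0, 9.0, 1.0]])   # numerator (s+4)(s+5): zeros at -4 and -5

for n0 in (1.0, 1e-2, 1e-4):
    rate_axis = filter_poles(A, B, C_axis, n0).real.max()
    rate_left = filter_poles(A, B, C_left, n0).real.max()
    print(f"N0 = {n0:g}: zeros on the axis -> {rate_axis:.4f}, "
          f"zeros in the left half plane -> {rate_left:.4f}")
```

With zeros on the imaginary axis the slowest filter pole should creep toward the axis as $N_0$ decreases, whereas with zeros at $-4$ and $-5$ it should settle near $-4$.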

A remark on the general nonlinear filtering problem for diffusions seems in
order here: in many cases, the optimal nonlinear filter is well approximated, as
$N_0 \to 0$, by a linear system: cf. [3], [4]. In those cases, the "memory length"
$\gamma$ will also exhibit the behavior described above. We omit the details here.